Tensor product of correlated text and visual features: a quantum theory inspired image retrieval framework
Abstract
In multimedia information retrieval, where a document may contain both textual and visual content features, document rankings are often computed by heuristically combining the feature spaces of different media types or by combining ranking scores computed independently from the different feature spaces. In this paper, we propose a principled approach inspired by Quantum Theory. Specifically, we propose a tensor product based model that represents the text and visual content features of an image as a non-separable composite system. The ranking scores of the images are then computed in the form of a quantum measurement. In addition, the correlations between features of different media types are incorporated into the framework. Experiments on ImageCLEF 2007 show promising performance of the tensor-based approach.

Introduction

With the rapidly increasing volume of digital image data, e.g. in specialised image repositories, social photo sharing sites and all sorts of multimedia documents on the Web, effective search for images that satisfy users' information needs has become a challenging research topic. In the early stages of image retrieval research, librarians had to attach keywords to each image so that relevant images could be retrieved with text retrieval techniques. Nowadays, however, manual labelling has become infeasible due to the increasing size of image collections. To circumvent this obstacle, content-based image retrieval (CBIR), which uses visual features to measure the content similarity between images, has been investigated. Typical visual features include colour histogram, texture and shape. An image is represented as a vector in a feature space. For example, each dimension in a colour histogram space corresponds to a colour bin along the R-G-B or H-S-V channels, and the value of an image on each dimension is the normalised number of pixels in the image falling into the corresponding bin. The similarity between two images can then be measured by how close their corresponding vectors are in the feature space, e.g. through the cosine function. Nevertheless, even state-of-the-art CBIR techniques achieve only limited performance because of the semantic gap between the content and its high-level semantics.

Given that more and more images and multimedia documents contain both visual content and a certain amount of text annotation (e.g. tags, metadata, text descriptions, etc.), combining the textual and visual features of images for image retrieval has recently attracted increasing attention. Three commonly adopted combination methods are: 1) using textual data to retrieve images and then re-ranking the results with their visual features (Yanai 2003); 2) using visual features to retrieve images and then re-ranking the results with their textual features (Tjondronegoro et al. 2005); 3) linearly combining the feature spaces, or the similarity scores based on the different features (Rahman, Bhattacharya, and Desai 2009; Matthew Simpson 2009; Min 2004). All of these methods treat the textual and visual features of images individually and combine them in a rather heuristic manner, making it difficult to capture the relationship between them. Indeed, since both the textual and visual features describe the same image, there are inherent correlations between them, and they should be incorporated into the retrieval process as a whole in a more principled way.
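To make the baseline concrete, the following sketch (in Python with NumPy; all function and parameter names are illustrative assumptions, not taken from the paper) shows how an image could be represented as a normalised HSV colour-histogram vector, how two images could be compared with the cosine function, and how a linear score fusion in the spirit of combination method 3 might look.

import numpy as np

def colour_histogram(hsv_pixels, bins=(8, 8, 8)):
    # Normalised colour histogram: each dimension is the fraction of pixels
    # falling into the corresponding (H, S, V) bin.
    hist, _ = np.histogramdd(hsv_pixels, bins=bins, range=[(0.0, 1.0)] * 3)
    return (hist / hist.sum()).ravel()

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def fused_score(text_score, visual_score, alpha=0.5):
    # Combination method 3: a linear mixture of the text-based and
    # visual-based ranking scores; alpha is a tuning weight.
    return alpha * text_score + (1.0 - alpha) * visual_score

# Example: two images given as (N, 3) arrays of HSV pixel values in [0, 1].
img_a = np.random.rand(10000, 3)
img_b = np.random.rand(10000, 3)
visual_sim = cosine(colour_histogram(img_a), colour_histogram(img_b))

In such a baseline the textual and visual scores never interact beyond the final weighted sum, which is exactly the limitation the tensor product model described next is meant to address.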
In this paper, we present a Quantum Theory inspired retrieval model based on the tensor product of the textual and visual features. It describes an annotated image as an n-order tensor in order to capture the non-separability of the textual and visual features. The order of the tensor depends on the visual features that are to be incorporated in the image retrieval; currently we focus on 2nd-order tensors. In practice, not every image is associated with a proper textual annotation: some annotations do not describe the content of the image at all, and some images do not contain any textual information. Ideally, this problem could be alleviated at the pre-processing stage by automatically annotating images with controlled textual labels, usually through supervised learning from pre-annotated training examples. However, automatic annotation is out of the scope of this paper and is an ongoing research topic in its own right. Instead, in this paper we are concerned with a finer-grained correlation between the dimensions across the textual and visual feature spaces. We present two rather straightforward statistical methods to associate dimensions (e.g. words) of the textual feature space with the dimensions (e.g. the HSV ...
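As a rough numerical illustration of the idea (a minimal sketch under the same assumptions as above, not the paper's exact formulation), an image's textual and visual feature vectors can be combined via an outer product into a 2nd-order tensor, and a measurement-style ranking score can be computed as the inner product between the query tensor and the document tensor, optionally re-weighted by a hypothetical text-visual correlation matrix.

import numpy as np

def unit(x):
    # L2-normalise a feature vector (a unit 'state vector' in the quantum analogy).
    return x / (np.linalg.norm(x) + 1e-12)

def composite(text_vec, visual_vec):
    # 2nd-order tensor (outer product) of the textual and visual feature
    # vectors, representing the annotated image as a composite system.
    return np.outer(unit(text_vec), unit(visual_vec))

def measurement_score(query_tensor, doc_tensor, correlation=None):
    # Ranking score as an inner product between query and document tensors.
    # 'correlation' is a hypothetical element-wise weight matrix linking
    # textual dimensions (rows) to visual dimensions (columns).
    if correlation is not None:
        doc_tensor = doc_tensor * correlation
    return float(np.sum(query_tensor * doc_tensor))

Without the correlation weighting, this score factorises into the product of the textual and visual cosine similarities; the correlation weights are what couple specific textual dimensions to specific visual dimensions, which is where the finer-grained statistical association discussed above would enter in this sketch.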
Similar Resources
Tensor Product of Correlated Textual and Visual Features: A Quantum Theory Inspired Image Retrieval Framework
In multimedia information retrieval, where a document may contain textual and visual content features, the ranking of documents is often computed by heuristically combining the feature spaces of different media types or combining the ranking scores computed independently from different feature spaces. In this paper, we propose a principled approach inspired by Quantum Theory. Specifically, we p...
Contextual Image Annotation via Projection and Quantum Theory Inspired Measurement for Integration of Text and Visual Features
Multimedia information retrieval suffers from the semantic gap, a difference between human perception and machine representation of images. In order to reduce the gap, a quantum theory inspired theoretical framework for integration of text and visual features has been proposed. This article is a followup work on this model. Previously, two relatively straightforward statistical approaches for m...
Image retrieval using the combination of text-based and content-based algorithms
Image retrieval is an important research field which has received great attention in the last decades. In this paper, we present an approach for the image retrieval based on the combination of text-based and content-based features. For text-based features, keywords and for content-based features, color and texture features have been used. Query in this system contains some keywords and an input...
RGU at ImageCLEF2010 Wikipedia Retrieval Task
This working notes paper describes our first participation in the ImageCLEF2010 Wikipedia Retrieval Task[1]. In this task, we mainly test our Quantum Theory inspired retrieval function on cross media retrieval. Instead of heuristically combining the ranking scores independently from different media types, we develop a tensor product based model to represent textual and visual content features o...
Combining Visual and Textual Systems within the Context of User Feedback
It has been proven experimentally, that a combination of textual and visual representations can improve the retrieval performance ([20], [23]). It is due to the fact, that the textual and visual feature spaces often represent complementary yet correlated aspects of the same image, thus forming a composite system. In this paper, we present a model for the combination of visual and textual sub-sy...
Publication date: 2012